Chilli's group workspace

balmy-universe-11593

State
Finished
Start time
June 26th, 2024 2:12:41 PM
Runtime
14m 21s
Tracked hours
14m 6s
Run path
eleutherai/neox/8jmpr9w7
OS
Linux-5.19.17-coreweave-x86_64-with-glibc2.17
Python version
3.8.19
Git repository
git clone https://lintangsutawika:@github.com/EleutherAI/gpt-neox.git
Git state
git checkout -b "balmy-universe-11593"
Command
train.py --local_rank=0 --deepspeed_config eyJ0cmFpbl9iYXRjaF9zaXplIjogNjQsICJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiA4LCAib3B0aW1pemVyIjogeyJ0eXBlIjogIkFkYW0iLCAicGFyYW1zIjogeyJiZXRhcyI6IFswLjksIDAuOTVdLCAiZXBzIjogMWUtMDh9fSwgImZwMTYiOiB7ImZwMTYiOiB0cnVlLCAiZW5hYmxlZCI6IHRydWUsICJsb3NzX3NjYWxlIjogMCwgImxvc3Nfc2NhbGVfd2luZG93IjogMTAwMCwgImh5c3RlcmVzaXMiOiAyLCAibWluX2xvc3Nfc2NhbGUiOiAxfSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDEsICJhbGxnYXRoZXJfcGFydGl0aW9ucyI6IHRydWUsICJhbGxnYXRoZXJfYnVja2V0X3NpemUiOiAxMjYwMDAwMDAwLCAib3ZlcmxhcF9jb21tIjogdHJ1ZSwgInJlZHVjZV9zY2F0dGVyIjogdHJ1ZSwgInJlZHVjZV9idWNrZXRfc2l6ZSI6IDEyNjAwMDAwMDAsICJjb250aWd1b3VzX2dyYWRpZW50cyI6IHRydWUsICJjcHVfb2ZmbG9hZCI6IGZhbHNlfX0= --megatron_config eyJ0cmFpbl9iYXRjaF9zaXplIjogNjQsICJ0cmFpbl9taWNyb19iYXRjaF9zaXplX3Blcl9ncHUiOiA4LCAib3B0aW1pemVyIjogeyJ0eXBlIjogIkFkYW0iLCAicGFyYW1zIjogeyJiZXRhcyI6IFswLjksIDAuOTVdLCAiZXBzIjogMWUtMDh9fSwgImZwMTYiOiB7ImZwMTYiOiB0cnVlLCAiZW5hYmxlZCI6IHRydWUsICJsb3NzX3NjYWxlIjogMCwgImxvc3Nfc2NhbGVfd2luZG93IjogMTAwMCwgImh5c3RlcmVzaXMiOiAyLCAibWluX2xvc3Nfc2NhbGUiOiAxfSwgInplcm9fb3B0aW1pemF0aW9uIjogeyJzdGFnZSI6IDEsICJhbGxnYXRoZXJfcGFydGl0aW9ucyI6IHRydWUsICJhbGxnYXRoZXJfYnVja2V0X3NpemUiOiAxMjYwMDAwMDAwLCAib3ZlcmxhcF9jb21tIjogdHJ1ZSwgInJlZHVjZV9zY2F0dGVyIjogdHJ1ZSwgInJlZHVjZV9idWNrZXRfc2l6ZSI6IDEyNjAwMDAwMDAsICJjb250aWd1b3VzX2dyYWRpZW50cyI6IHRydWUsICJjcHVfb2ZmbG9hZCI6IGZhbHNlfSwgInByZWNpc2lvbiI6ICJmcDE2IiwgIm51bV9sYXllcnMiOiAyLCAiaGlkZGVuX3NpemUiOiAyNTYsICJudW1fYXR0ZW50aW9uX2hlYWRzIjogNCwgInNlcV9sZW5ndGgiOiAyMDQ4LCAibWF4X3Bvc2l0aW9uX2VtYmVkZGluZ3MiOiAyMDQ4LCAicG9zX2VtYiI6ICJyb3RhcnkiLCAibm9fd2VpZ2h0X3R5aW5nIjogdHJ1ZSwgImF0dGVudGlvbl9jb25maWciOiBbImdsb2JhbCIsICJnbG9iYWwiXSwgInNwYXJzaXR5X2NvbmZpZyI6IHt9LCAic2NhbGVkX3VwcGVyX3RyaWFuZ19tYXNrZWRfc29mdG1heF9mdXNpb24iOiB0cnVlLCAiYmlhc19nZWx1X2Z1c2lvbiI6IHRydWUsICJpbml0X21ldGhvZF9zdGQiOiAwLjA4LCAicm90YXJ5X3BjdCI6IDAuMjUsICJncHRfal9yZXNpZHVhbCI6IHRydWUsICJscl9kZWNheV9zdHlsZSI6ICJjb25zdGFudCIsICJ3YXJtdXAiOiAwLCAib3B0aW1pemVyX3R5cGUiOiAiQW
RhbSIsICJ6ZXJvX3N0YWdlIjogMSwgInplcm9fcmVkdWNlX3NjYXR0ZXIiOiB0cnVlLCAiemVyb19jb250aWd1b3VzX2dyYWRpZW50cyI6IHRydWUsICJ6ZXJvX3JlZHVjZV9idWNrZXRfc2l6ZSI6IDEyNjAwMDAwMDAsICJ6ZXJvX2FsbGdhdGhlcl9idWNrZXRfc2l6ZSI6IDEyNjAwMDAwMDAsICJsciI6IDAuMDEsICJ0b2tlbml6ZXJfdHlwZSI6ICJIRlRva2VuaXplciIsICJkYXRhX3BhdGgiOiAiL21udC9zc2QtMS9saW50YW5nLzA5LW11cC1uZW94L2RhdGEvZW53aWs4L2Vud2lrOF90ZXh0X2RvY3VtZW50IiwgImRhdGFfaW1wbCI6ICJtbWFwIiwgImNvbmZpZ19maWxlcyI6IHsiY29vcmRfY2hlY2tfbXVwLnltbCI6ICJ7XG4gICMgcGFyYWxsZWxpc20gc2V0dGluZ3NcbiAgXCJwaXBlX3BhcmFsbGVsX3NpemVcIjogMSxcbiAgXCJtb2RlbF9wYXJhbGxlbF9zaXplXCI6IDEsXG5cbiAgIyBtb2RlbCBzZXR0aW5nc1xuICBcIm51bV9sYXllcnNcIjogMixcbiAgXCJudW1fYXR0ZW50aW9uX2hlYWRzXCI6IDQsXG4gIFwic2VxX2xlbmd0aFwiOiAyMDQ4LFxuICBcIm1heF9wb3NpdGlvbl9lbWJlZGRpbmdzXCI6IDIwNDgsXG4gIFwicG9zX2VtYlwiOiBcInJvdGFyeVwiLFxuICBcInJvdGFyeV9wY3RcIjogMC4yNSxcbiAgXCJub193ZWlnaHRfdHlpbmdcIjogdHJ1ZSxcbiAgXCJncHRfal9yZXNpZHVhbFwiOiB0cnVlLFxuICBcIm91dHB1dF9sYXllcl9wYXJhbGxlbGlzbVwiOiBcImNvbHVtblwiLFxuXG4gICMgdGhlc2Ugc2hvdWxkIHByb3ZpZGUgc29tZSBzcGVlZHVwIGJ1dCB0YWtlcyBhIHdoaWxlIHRvIGJ1aWxkLCBzZXQgdG8gdHJ1ZSBpZiBkZXNpcmVkXG4gIFwic2NhbGVkX3VwcGVyX3RyaWFuZ19tYXNrZWRfc29mdG1heF9mdXNpb25cIjogdHJ1ZSxcbiAgXCJiaWFzX2dlbHVfZnVzaW9uXCI6IHRydWUsXG5cbiAgIyAjIGluaXQgbWV0aG9kc1xuICAjIFwiaW5pdF9tZXRob2RcIjogXCJzbWFsbF9pbml0XCIsXG4gICMgXCJvdXRwdXRfbGF5ZXJfaW5pdF9tZXRob2RcIjogXCJ3YW5nX2luaXRcIixcblxuICAjIGluaXQgbWV0aG9kc1xuICBcImluaXRfbWV0aG9kXCI6IFwibm9ybWFsXCIsXG4gIFwib3V0cHV0X2xheWVyX2luaXRfbWV0aG9kXCI6IFwic2NhbGVkX25vcm1hbFwiLFxuXG4gICMgb3B0aW1pemVyIHNldHRpbmdzXG4gIFwib3B0aW1pemVyXCI6IHtcbiAgICBcInR5cGVcIjogXCJBZGFtXCIsXG4gICAgXCJwYXJhbXNcIjoge1xuICAgICAgXCJiZXRhc1wiOiBbMC45LCAwLjk1XSxcbiAgICAgIFwiZXBzXCI6IDEuMGUtOCxcbiAgICB9XG4gIH0sXG4gIFwibHJfZGVjYXlfc3R5bGVcIjogY29uc3RhbnQsXG4gIFwid2FybXVwXCI6IDAsXG5cbiAgIyBmb3IgYWxsIHplcm9fb3B0aW1pemF0aW9uIG9wdGlvbnMsIHNlZSBodHRwczovL3d3dy5kZWVwc3BlZWQuYWkvZG9jcy9jb25maWctanNvbi8jemVyby1vcHRpbWl6YXRpb25zLWZvci1mcDE2LXRyYWluaW5nXG4gICBcInplcm9fb3B0aW1pemF0aW9uXCI6IHtcbiAgIC
BcInN0YWdlXCI6IDEsXG4gICAgXCJhbGxnYXRoZXJfcGFydGl0aW9uc1wiOiB0cnVlLFxuICAgIFwiYWxsZ2F0aGVyX2J1Y2tldF9zaXplXCI6IDEyNjAwMDAwMDAsXG4gICAgXCJvdmVybGFwX2NvbW1cIjogdHJ1ZSxcbiAgICBcInJlZHVjZV9zY2F0dGVyXCI6IHRydWUsXG4gICAgXCJyZWR1Y2VfYnVja2V0X3NpemVcIjogMTI2MDAwMDAwMCxcbiAgICBcImNvbnRpZ3VvdXNfZ3JhZGllbnRzXCI6IHRydWUsXG4gICAgXCJjcHVfb2ZmbG9hZFwiOiBmYWxzZVxuICB9LFxuXG4gICMgYmF0Y2ggLyBkYXRhIHNldHRpbmdzXG4gIFwidHJhaW5fbWljcm9fYmF0Y2hfc2l6ZV9wZXJfZ3B1XCI6IDgsXG4gIFwiZ3JhZGllbnRfYWNjdW11bGF0aW9uX3N0ZXBzXCI6IDEsXG4gIFwiZGF0YV9pbXBsXCI6IFwibW1hcFwiLFxuICBcIm51bV93b3JrZXJzXCI6IDEsXG5cbiAgIyBhY3RpdmF0aW9uIGNoZWNrcG9pbnRpbmdcbiAgXCJjaGVja3BvaW50X2FjdGl2YXRpb25zXCI6IHRydWUsXG4gIFwiY2hlY2twb2ludF9udW1fbGF5ZXJzXCI6IDEsXG4gIFwicGFydGl0aW9uX2FjdGl2YXRpb25zXCI6IHRydWUsXG4gIFwic3luY2hyb25pemVfZWFjaF9sYXllclwiOiB0cnVlLFxuXG4gICMgcmVndWxhcml6YXRpb25cbiAgXCJncmFkaWVudF9jbGlwcGluZ1wiOiAxLjAsXG4gIFwid2VpZ2h0X2RlY2F5XCI6IDAuMCxcbiAgXCJoaWRkZW5fZHJvcG91dFwiOiAwLFxuICBcImF0dGVudGlvbl9kcm9wb3V0XCI6IDAsXG5cbiAgIyBwcmVjaXNpb24gc2V0dGluZ3NcbiAgIyBcInByZWNpc2lvblwiOiBcImZwMzJcIixcbiAgXCJmcDE2XCI6IHtcbiAgICBcImZwMTZcIjogdHJ1ZSxcbiAgICBcImVuYWJsZWRcIjogdHJ1ZSxcbiAgICBcImxvc3Nfc2NhbGVcIjogMCxcbiAgICBcImxvc3Nfc2NhbGVfd2luZG93XCI6IDEwMDAsXG4gICAgXCJoeXN0ZXJlc2lzXCI6IDIsXG4gICAgXCJtaW5fbG9zc19zY2FsZVwiOiAxXG4gIH0sXG5cbiAgIyBtaXNjLiB0cmFpbmluZyBzZXR0aW5nc1xuICBcInRyYWluX2l0ZXJzXCI6IDEwLFxuICBcImxvZ19pbnRlcnZhbFwiOiAxLFxuICBcImRpc3RyaWJ1dGVkX2JhY2tlbmRcIjogXCJuY2NsXCIsXG5cbiAgXCJjb29yZF9jaGVja1wiOiB0cnVlLFxuICBcImNvb3JkX2NoZWNrX25zdGVwc1wiOiAxMCxcbiAgXCJjb29yZF9jaGVja19uc2VlZHNcIjogMyxcbiAgXCJ1c2VfbXVwXCI6IHRydWUsXG4gICMgYmFzZSBsclxuICBcIm11cF9sclwiOiAwLjAxLFxuICAjIGJhc2Ugc2lnbWFcbiAgXCJtdXBfc3RkXCI6IDAuMDgsXG4gICMgYmFzZSBzaXplXG4gIFwibXVwX2RfbW9kZWxfYmFzZVwiOiAyNTYsXG4gIFwibXVwX2hpZGRlbl9zaXplXCI6IDI1NixcblxuICBcInRva2VuaXplcl90eXBlXCI6IFwiSEZUb2tlbml6ZXJcIixcbiAgXCJ2b2NhYi1maWxlXCI6IFwiL21udC9zc2QtMS9saW50YW5nLzA5LW11cC1uZW94LzIwQl90b2tlbml6ZXIuanNvblwiLFxuICBcImRhdGEtcGF0aFwiOiBcIi9tbnQvc3NkLTEvbGludGFuZy8wOS
1tdXAtbmVveC9kYXRhL2Vud2lrOC9lbndpazhfdGV4dF9kb2N1bWVudFwiLFxuICBcIm11cF9zYXZlXCI6IFwiL21udC9zc2QtMS9saW50YW5nLzA5LW11cC1uZW94L211cF9yZXN1bHRzXCIsXG5cbn1cbiJ9LCAiYmF0Y2hfc2l6ZSI6IDgsICJ0cmFpbl9pdGVycyI6IDEwLCAidm9jYWJfZmlsZSI6ICIvbW50L3NzZC0xL2xpbnRhbmcvMDktbXVwLW5lb3gvMjBCX3Rva2VuaXplci5qc29uIiwgIm51bV93b3JrZXJzIjogMSwgIndlaWdodF9kZWNheSI6IDAuMCwgImNoZWNrcG9pbnRfYWN0aXZhdGlvbnMiOiB0cnVlLCAic3luY2hyb25pemVfZWFjaF9sYXllciI6IHRydWUsICJwYXJ0aXRpb25fYWN0aXZhdGlvbnMiOiB0cnVlLCAiZHluYW1pY19sb3NzX3NjYWxlIjogdHJ1ZSwgInVzZV9tdXAiOiB0cnVlLCAibXVwX3NhdmUiOiAiL21udC9zc2QtMS9saW50YW5nLzA5LW11cC1uZW94L211cF9yZXN1bHRzIiwgIm11cF9sciI6IDAuMDEsICJtdXBfc3RkIjogMC4wOCwgIm11cF9oaWRkZW5fc2l6ZSI6IDI1NiwgImNvb3JkX2NoZWNrIjogdHJ1ZSwgImNvb3JkX2NoZWNrX25zZWVkcyI6IDMsICJtdXBfd2lkdGhfbXVsdGlwbGllciI6IDEuMCwgInBpcGVfcGFyYWxsZWxfc2l6ZSI6IDEsICJ3b3JsZF9zaXplIjogMSwgImxvZ19pbnRlcnZhbCI6IDEsICJ0ZXh0X2dlbl90eXBlIjogInVuY29uZGl0aW9uYWwiLCAibG9jYWxfcmFuayI6IDAsICJyYW5rIjogMCwgInVzZXJfc2NyaXB0IjogInRyYWluLnB5IiwgImdsb2JhbF9udW1fZ3B1cyI6IDh9
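
The `--deepspeed_config` and `--megatron_config` values in the command above appear to be base64-encoded JSON. The following is a minimal sketch of how such an argument could be decoded for inspection; `decode_config_arg` is a hypothetical helper name, not part of gpt-neox or DeepSpeed, and the round-trip dictionary is made up for illustration.

```python
import base64
import json


def decode_config_arg(encoded: str) -> dict:
    """Decode a base64 string into the JSON object it carries (hypothetical helper)."""
    raw = base64.b64decode(encoded)
    return json.loads(raw.decode("utf-8"))


# First 32 characters of the --deepspeed_config value shown in the command.
# 32 base64 characters decode to exactly 24 bytes, so no padding is needed.
SAMPLE_PREFIX = "eyJ0cmFpbl9iYXRjaF9zaXplIjogNjQs"
print(base64.b64decode(SAMPLE_PREFIX))  # b'{"train_batch_size": 64,'

# Round trip with a small, made-up config to exercise the full decode path.
example = {"train_batch_size": 64, "train_micro_batch_size_per_gpu": 8}
encoded = base64.b64encode(json.dumps(example).encode("utf-8")).decode("ascii")
print(decode_config_arg(encoded) == example)
```

To decode a full value copied from this page, the wrapped lines would first need to be joined back into a single string.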
System Hardware
CPU count 48
Logical CPU count 96
GPU count 8
GPU type NVIDIA A40
W&B CLI Version
0.16.6
Config

Config parameters are your model's inputs.

  • {} 262 keys
    • null
    • "gelu"
    • null
    • false
    • 1,000
    • null
    • false
    • [] 2 items
      • "global"
      • "global"
    • 0
    • false
    • null
    • null
    • null
    • 8
    • null
    • false
    • true
    • false
    • null
    • true
    • null
    • false
    • 1
    • "linear"
    • false
    • 1
    • null
    • null
    • null
    • null
    • {} 1 key
      • "{
          # parallelism settings
          "pipe_parallel_size": 1,
          "model_parallel_size": 1,
          # model settings
          "num_layers": 2,
          "num_attention_heads": 4,
          "seq_length": 2048,
          "max_position_embeddings": 2048,
          "pos_emb": "rotary",
          "rotary_pct": 0.25,
          "no_weight_tying": true,
          "gpt_j_residual": true,
          "output_layer_parallelism": "column",
          # these should provide some speedup but takes a while to build, set to true if desired
          "scaled_upper_triang_masked_softmax_fusion": true,
          "bias_gelu_fusion": true,
          # # init methods
          # "init_method": "small_init",
          # "output_layer_init_method": "wang_init",
          # init methods
          "init_method": "normal",
          "output_layer_init_method": "scaled_normal",
          # optimizer settings
          "optimizer": {
            "type": "Adam",
            "params": {
              "betas": [0.9, 0.95],
              "eps": 1.0e-8,
            }
          },
          "lr_decay_style": constant,
          "warmup": 0,
          # for all zero_optimization options, see https://www.deepspeed.ai/docs/config-json/#zero-optimizations-for-fp16-training
          "zero_optimization": {
            "stage": 1,
            "allgather_partitions": true,
            "allgather_bucket_size": 1260000000,
            "overlap_comm": true,
            "reduce_scatter": true,
            "reduce_bucket_size": 1260000000,
            "contiguous_gradients": true,
            "cpu_offload": false
          },
          # batch / data settings
          "train_micro_batch_size_per_gpu": 8,
          "gradient_accumulation_steps": 1,
          "data_impl": "mmap",
          "num_workers": 1,
          # activation checkpointing
          "checkpoint_activations": true,
          "checkpoint_num_layers": 1,
          "partition_activations": true,
          "synchronize_each_layer": true,
          # regularization
          "gradient_clipping": 1.0,
          "weight_decay": 0.0,
          "hidden_dropout": 0,
          "attention_dropout": 0,
          # precision settings
          # "precision": "fp32",
          "fp16": {
            "fp16": true,
            "enabled": true,
            "loss_scale": 0,
            "loss_scale_window": 1000,
            "hysteresis": 2,
            "min_loss_scale": 1
          },
          # misc. training settings
          "train_iters": 10,
          "log_interval": 1,
          "distributed_backend": "nccl",
          "coord_check": true,
          "coord_check_nsteps": 10,
          "coord_check_nseeds": 3,
          "use_mup": true,
          # base lr
          "mup_lr": 0.01,
          # base sigma
          "mup_std": 0.08,
          # base size
          "mup_d_model_base": 256,
          "mup_hidden_size": 256,
          "tokenizer_type": "HFTokenizer",
          "vocab-file": "/mnt/ssd-1/lintang/09-mup-neox/20B_tokenizer.json",
          "data-path": "/mnt/ssd-1/lintang/09-mup-neox/data/enwik8/enwik8_text_document",
          "mup_save": "/mnt/ssd-1/lintang/09-mup-neox/mup_results",
        }"
    • false
    • true
    • 3
    • 10
    • true
    • null
    • null
    • 0
    • null
    • "mmap"
    • "/mnt/ssd-1/lintang/09-mup-neox/data/enwik8/enwik8_text_document"
    • null
    • false
    • null
    • true
    • {} 8 keys
      • 1,260,000,000
      • true
      • 1
Summary

Summary metrics are your model's outputs.

  • {} 0 keys